A Dive into the Global Housing Market from 2015-2024¶

  • Student: Lewis King
  • ID: 000051992
In [2]:
#This code imports the Libraries needed to open the data frame and visualize it in graphs. 
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
from matplotlib.animation import FuncAnimation
In [3]:
#This will allow pandas to read and open the dataset
df = pd.read_csv("global_housing_market_extended (2).csv")
df
Out[3]:
Country Year House Price Index Rent Index Affordability Ratio Mortgage Rate (%) Inflation Rate (%) GDP Growth (%) Population Growth (%) Urbanization Rate (%) Construction Index
0 USA 2015 117.454012 116.550001 9.587945 4.493292 1.514121 -0.752044 -0.796707 85.985284 118.089201
1 USA 2016 150.807258 51.440915 11.729189 5.662213 1.880204 -0.545400 -0.358084 69.127267 111.980515
2 USA 2017 123.194502 70.386040 8.506676 2.197469 2.398940 0.930895 0.596245 83.555279 85.973903
3 USA 2018 131.423444 91.469020 3.418054 4.537724 1.608407 -1.479587 2.321099 88.968961 134.671788
4 USA 2019 110.461377 56.837048 9.158097 3.700762 1.293249 1.961415 -0.879640 87.279612 90.702399
... ... ... ... ... ... ... ... ... ... ... ...
195 UAE 2020 160.153711 117.170365 7.390686 2.048681 4.061737 1.635019 1.955250 62.942477 109.059291
196 UAE 2021 95.004866 72.727315 9.636214 3.880091 2.943274 1.155810 0.608064 83.550496 141.366775
197 UAE 2022 175.533469 105.083237 5.838662 4.940674 3.344420 0.037365 1.943051 61.152790 142.140959
198 UAE 2023 126.147746 94.604103 8.934185 5.975589 4.638353 2.911469 -0.766718 75.552241 82.013520
199 UAE 2024 153.743377 85.855534 9.122050 1.708365 1.051148 3.730587 -0.747705 62.137702 70.968678

200 rows × 11 columns

In [4]:
#This function shows the shape of the dataset 
df.shape
Out[4]:
(200, 11)

The shape shows that the dataset has 200 rows and 11 columns.

In [6]:
#This function is going to display the first 5 rows of the dataset.
df.head()
Out[6]:
Country Year House Price Index Rent Index Affordability Ratio Mortgage Rate (%) Inflation Rate (%) GDP Growth (%) Population Growth (%) Urbanization Rate (%) Construction Index
0 USA 2015 117.454012 116.550001 9.587945 4.493292 1.514121 -0.752044 -0.796707 85.985284 118.089201
1 USA 2016 150.807258 51.440915 11.729189 5.662213 1.880204 -0.545400 -0.358084 69.127267 111.980515
2 USA 2017 123.194502 70.386040 8.506676 2.197469 2.398940 0.930895 0.596245 83.555279 85.973903
3 USA 2018 131.423444 91.469020 3.418054 4.537724 1.608407 -1.479587 2.321099 88.968961 134.671788
4 USA 2019 110.461377 56.837048 9.158097 3.700762 1.293249 1.961415 -0.879640 87.279612 90.702399
In [7]:
# This function is going to output the general information of the Dataset
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 11 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Country                200 non-null    object 
 1   Year                   200 non-null    int64  
 2   House Price Index      200 non-null    float64
 3   Rent Index             200 non-null    float64
 4   Affordability Ratio    200 non-null    float64
 5   Mortgage Rate (%)      200 non-null    float64
 6   Inflation Rate (%)     200 non-null    float64
 7   GDP Growth (%)         200 non-null    float64
 8   Population Growth (%)  200 non-null    float64
 9   Urbanization Rate (%)  200 non-null    float64
 10  Construction Index     200 non-null    float64
dtypes: float64(9), int64(1), object(1)
memory usage: 17.3+ KB

The output above shows the general information of the data showing that there are 11 different columns, null value count, and the data type. This data set includes numerical data and categorical data but mostly numerical.

In [9]:
# This code shows if there are null values in any of the columns and finds the sum of the null values per column
df.isnull().sum()
Out[9]:
Country                  0
Year                     0
House Price Index        0
Rent Index               0
Affordability Ratio      0
Mortgage Rate (%)        0
Inflation Rate (%)       0
GDP Growth (%)           0
Population Growth (%)    0
Urbanization Rate (%)    0
Construction Index       0
dtype: int64

The output above shows that there are no missing values to deal with in this dataset. However, if there were missing values I would first copy the dataset into a new file then using the new dataset I would replace the null values with the statement "Unknown" for a categorical column, and for numerical columns, I would find the average of the column and replace the missing values with the average.

In [11]:
# This will check for duplicates and find the sum of the number of duplicates
duplicates= df.duplicated().sum()
print("Number of duplicates:", duplicates)
Number of duplicates: 0
In [12]:
# This will display a list of all the numerical columns and provide statical data on them 
df.describe()
Out[12]:
Year House Price Index Rent Index Affordability Ratio Mortgage Rate (%) Inflation Rate (%) GDP Growth (%) Population Growth (%) Urbanization Rate (%) Construction Index
count 200.000000 200.000000 200.000000 200.000000 200.000000 200.000000 200.000000 200.000000 200.000000 200.000000
mean 2019.500000 130.381022 83.048370 7.237768 4.150621 3.649756 2.133722 0.722806 74.769891 111.204257
std 2.879489 28.752229 21.439858 2.576085 1.380222 1.881938 2.413270 1.035691 8.734246 24.309473
min 2015.000000 80.552212 50.354311 3.041688 1.537814 0.532110 -1.921833 -0.961391 60.172760 70.968678
25% 2017.000000 104.142562 60.466671 5.034207 3.045278 1.939199 -0.095628 -0.183320 66.923842 90.182198
50% 2019.500000 129.193653 83.721711 7.375697 4.329643 3.664625 2.307548 0.722372 75.100060 110.593807
75% 2022.000000 157.127098 100.604665 9.276196 5.217706 5.261720 4.272776 1.621317 82.677430 133.780445
max 2024.000000 179.971767 119.855388 11.879671 6.485623 6.912349 5.958931 2.497948 89.788944 149.735748
In [13]:
df_new=df.copy()
In [14]:
# This will remove' (%)' from the column names in the DataFrame to make it easier to code with. 
df_new.columns = df_new.columns.str.replace(' (%)', '')
df_new
Out[14]:
Country Year House Price Index Rent Index Affordability Ratio Mortgage Rate Inflation Rate GDP Growth Population Growth Urbanization Rate Construction Index
0 USA 2015 117.454012 116.550001 9.587945 4.493292 1.514121 -0.752044 -0.796707 85.985284 118.089201
1 USA 2016 150.807258 51.440915 11.729189 5.662213 1.880204 -0.545400 -0.358084 69.127267 111.980515
2 USA 2017 123.194502 70.386040 8.506676 2.197469 2.398940 0.930895 0.596245 83.555279 85.973903
3 USA 2018 131.423444 91.469020 3.418054 4.537724 1.608407 -1.479587 2.321099 88.968961 134.671788
4 USA 2019 110.461377 56.837048 9.158097 3.700762 1.293249 1.961415 -0.879640 87.279612 90.702399
... ... ... ... ... ... ... ... ... ... ... ...
195 UAE 2020 160.153711 117.170365 7.390686 2.048681 4.061737 1.635019 1.955250 62.942477 109.059291
196 UAE 2021 95.004866 72.727315 9.636214 3.880091 2.943274 1.155810 0.608064 83.550496 141.366775
197 UAE 2022 175.533469 105.083237 5.838662 4.940674 3.344420 0.037365 1.943051 61.152790 142.140959
198 UAE 2023 126.147746 94.604103 8.934185 5.975589 4.638353 2.911469 -0.766718 75.552241 82.013520
199 UAE 2024 153.743377 85.855534 9.122050 1.708365 1.051148 3.730587 -0.747705 62.137702 70.968678

200 rows × 11 columns

In [15]:
# This will filter the dataset to include only rows where Mortgage Rate is greater than 0. Select the Mortgage Rate column for those rows, then display it. 
mort = df_new[df_new['Mortgage Rate'] > 0][['Mortgage Rate']]
mort
Out[15]:
Mortgage Rate
0 4.493292
1 5.662213
2 2.197469
3 4.537724
4 3.700762
... ...
195 2.048681
196 3.880091
197 4.940674
198 5.975589
199 1.708365

200 rows × 1 columns

Because this outcome shows all 200 rows we no the are no impossible numbers in the Mortgage Rate.

In [17]:
# Prints the number of unique countries in the Country column
print("Unique countries:", df_new['Country'].nunique())
# Prints the unique years present in the Year column
print("Year range:", df_new['Year'].unique())
Unique countries: 20
Year range: [2015 2016 2017 2018 2019 2020 2021 2022 2023 2024]
In [18]:
# Defines a list of important columns to focus on
important_columns = ['House Price Index', 'Rent Index', 'Mortgage Rate', 'Affordability Ratio']
# Displays descriptive statistics  for the important columns
df_new[important_columns].describe()
Out[18]:
House Price Index Rent Index Mortgage Rate Affordability Ratio
count 200.000000 200.000000 200.000000 200.000000
mean 130.381022 83.048370 4.150621 7.237768
std 28.752229 21.439858 1.380222 2.576085
min 80.552212 50.354311 1.537814 3.041688
25% 104.142562 60.466671 3.045278 5.034207
50% 129.193653 83.721711 4.329643 7.375697
75% 157.127098 100.604665 5.217706 9.276196
max 179.971767 119.855388 6.485623 11.879671
In [19]:
# Group the DataFrame by Year and calculate the mean House Price Index for each year
price_over_time = df_new.groupby('Year')['House Price Index'].mean()
# Plot the average House Price Index over time with markers at each data point
price_over_time.plot(marker='o')
# Set the title of the plot
plt.title('Global Average House Price Index Over Time')
# Label the x-axis and y-axis, and add a grid to the plot 
plt.xlabel('Year')
plt.ylabel('House Price Index')
plt.grid(True)
# Display the plot
plt.show()
No description has been provided for this image

This line plot shows the Global Average House Price Index Over Time. Some observations to be noticed are that from 2015 to 2024, the global average house price index has generally increased, indicating rising housing costs worldwide. Then, during the COVID-19 pandemic years 2020–2021, instead of falling as would be expected during an economic recession, the average house price index rose sharply. And after the pandemic, the house price index still continued to rise.

In [21]:
# Groups each selected year from the year column into one value.  
covid_years = df_new[df_new['Year'].isin([2019, 2020, 2021, 2022])]
# Group by country and year and calculate the mean of the house price index
trend = covid_years.groupby(['Country', 'Year'])['House Price Index'].mean().reset_index()
plt.figure(figsize=(15, 6))
# Plots bar plots for each year
sns.barplot(x='Year', y='House Price Index', hue='Country', data=trend, palette='Set1', edgecolor='black', linewidth=1.5)
# Adds labels and a title shows the plot
plt.title(' House Price Index by Country during Covid-19 (2019-2022)')
plt.xlabel('Year')
plt.ylabel('Average House Price Index')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')# Show the plot
plt.show()
No description has been provided for this image

After noticing a sharp increase in the house price index during the years of the pandemic, I decided to look into it more thoroughly, looking at each country in the dataset. This Bar plot confirms that there was a sharp increase in the average House Price Index after 2020 across all countries due to the COVID-19 pandemic and the global economic recession that followed due to COVID-19. Another thing to notice is that most countries reached their peak housing prices in 2021, likely due to very low mortgage rates and lifestyle changes during the pandemic because of remote work.

In [23]:
# Creates a list of key housing factors to analyze
key_vaules = ['House Price Index', 'Rent Index', 'Affordability Ratio', 'Mortgage Rate', 'Inflation Rate', 'GDP Growth', 'Population Growth', 'Urbanization Rate', 'Construction Index']
# Makes the correlation matrix for the selected key housing factors
matrix = df_new[key_vaules].corr()
plt.figure(figsize=(12, 8))
# Plot the heatmap of the correlation matrix with annotations and a color map
sns.heatmap(matrix, annot=True, cmap='coolwarm', linewidths=0.5)
# Set the title of the heatmap then displays it
plt.title('Correlation Matrix of Key Housing Factors')
plt.xticks(rotation=45)
plt.show()
No description has been provided for this image

The correlation matrix of key housing factors shows a positive correlation between the House Price Index and Rent Index, indicating that as house prices rise, rent prices tend to follow. Negative correlations between GDP Growth and Inflation Rate suggest that higher inflation may be associated with lower economic growth. Overall, the matrix provides valuable insights into the interconnectedness of housing market factors, aiding in informed decision-making and strategy development.

In [25]:
# Groups the dataset by Country and calculates the average House Price Index for each country, then selects the top 10 countries
ave = df_new.groupby('Country')['House Price Index'].mean().sort_values(ascending=False).head(10)
# Plot a bar plot for the top 10 countries by average house price index
sns.barplot(x=ave.values, y=ave.index, hue=ave.index, palette='dark', legend=True)
plt.title('Top 10 Countries by Average House Price Index')
plt.xlabel('Average House Price Index')
plt.ylabel('Country')
plt.legend(bbox_to_anchor=(1, 1), loc='upper left')
plt.show()
No description has been provided for this image

The bar plot showing the Top 10 Countries by Average House Price Index highlights that nations like Switzerland, Brazil, and the Netherlands consistently have the highest housing prices from 2015 to 2024. Which makes sense because these countries have a strong urban demand and limited housing supply. Overall, the results show that the highest house prices are concentrated in highly urbanized, economically strong countries with limited housing availability.

In [27]:
# Create a histogram of the Mortgage Rate column with 35 bins and a line to show density
sns.histplot(df_new['Mortgage Rate'], bins=35, kde=True, color='skyblue')
plt.title('Mortgage Rate Distribution')
plt.xlabel('Mortgage Rate')
plt.ylabel('Frequency')
plt.show()
No description has been provided for this image

The distribution of mortgage rates shows that most countries experienced moderately low mortgage rates, with the majority of rates clustered between 2% and 6%. There is a peak around 4.5% suggesting this was the most common mortgage rate during the 2015–2024 period. Overall, the distribution supports the idea that low mortgage rates played a major role in causing housing prices to increase globally during 2015-2024.

In [29]:
# Creates a scatter plot to show the relationship between the House Price Index and the Rent Index. While the points are colored by Country, with black edges around the points.
sns.scatterplot(x='House Price Index', y='Rent Index', data=df_new, hue='Country',edgecolor='black', alpha=1)
plt.title('House Price Index vs Rent Index')
plt.xlabel('House Price Index')
plt.ylabel('Rent Index')
plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.show()
No description has been provided for this image

The graph above shows a scatter plot comparing the House Price Index and Rent Index, showing a positive relationship between the two variables. What can be seen is that as house prices increase, rental prices also tend to rise. However, the data points are still pretty spread out, suggesting that while the two variables are related, the strength of the relationship varies by country.

In [31]:
# Creates an interactive scatter plot using fig.write_image("plot.png") to show the relationship between the House Price Index and the Rent Index, the points are colored by Country. 
fig = px.scatter(df_new, x='House Price Index', y='Rent Index', color='Country', title='House Price Index vs Rent Index')
# Display the interactive plot
fig.show()
fig.write_html("house_price_vs_rent.html")

This scatter plot displays the same ideas above However, I chose to show it again using a different method to make it easier to dive deeper into the points on the graph.

In [33]:
# Groups the dataset by Year and calculates the average Affordability Ratio for each year
aff_over_t = df_new.groupby('Year')['Affordability Ratio'].mean()
# Plots the average Affordability Ratio over time, using red color and circular markers at each data point
aff_over_t.plot(marker='o', color='red')
plt.title('Affordability Index Over Time')
plt.xlabel('Year')
plt.ylabel('Average Affordability Index')
plt.grid(True)
plt.show()
No description has been provided for this image

This line plot shows the Affordability Ratio over time and reveals trends and anomalies in affordability. One thing noticed is that from 2015 to 2017 houses were more affordable, but from 2018 to 2020 shows an all-time high, which further suggests that the Covid-19 years had an extreme effect on housing prices, causing them to increase and affect the market globally. However, from 2022 to 2024, we can see the rates start to stabilize again and show that the housing market is becoming predictable again.

In [35]:
# This groups the dataset by Country and calculates the average House Price Index for each country
geo_map = df_new.groupby('Country').agg({'House Price Index':'mean'}).reset_index()
# Creates an interactive choropleth map using Plotly Express to visualize the average House Price Index by country
fig = px.choropleth(geo_map, locations='Country', locationmode='country names', color='House Price Index', color_continuous_scale='plasma', title='Average House Price Index by Country')
fig.show()
# Save as an interactive HTML file
fig.write_html("average_house_price_by_country.html")

The choropleth map of the average house price index by country shows countries like the USA, Australia, and the UK have expected high house price indexes, displaying more expensive housing markets. However, countries such as Mexico and South Africa have lower indexes, suggesting more affordable housing. One surprising thing to notice about North America is the very stark difference between the USA, Canada, and Mexico. The USA shows the highest price index and I expect Canada to follow, but out of the whole of North America, they have the lower price index.

Insights:¶

  • Affordability has generally declined over time across all countries around the world.
  • Certain countries face critical affordability issues, especially urbanized and highly developed areas like the USA.
  • Rising housing costs strongly correlate with mortgage rate fluctuations and inflation and each one of these variables affects the other.
  • A limitations might include the exclusion of certain countries.
In [38]:
# Filter the DataFrame to include only rows for the countries 'USA', 'Canada', and 'Mexico'. Then creates an interactive line plot to visualize the trend of House Price Index over time for these countries
fig = px.line(df_new[df_new['Country'].isin(['USA', 'Canada', 'Mexico'])], x='Year', y='House Price Index', color='Country', title='Trend of House Price Index over Time for the USA, Canada, and UK')
fig.show()
# Save as an interactive HTML file
fig.write_html("house_price_trend_usa_canada_mexico.html")

The line plot of the House Price Index trends for the USA, Canada, and Mexico from 2015 to 2024 reveals overall rising housing prices in all three countries in North America. The USA shows a steady increase with peaks around 2020 and 2024, showing a strong increase in house prices. Canada displays more sudden changes, with fluctuations and a peak around 2019 followed by a decline and a rise. Mexico experiences sharp increases around 2018 and 2023, suggesting periods of fast increase.

In [40]:
df_new.to_csv('global_housing_market_extended (2)_cleaned.csv')
In [ ]: